DOCUMENT RESUME

ED 130 495                                             FL 007 929

AUTHOR        Perry, Jessica, Ed.; Pietrzyk, Alfred, Ed.
TITLE         Preliminaries to the Design of LINCS Indexing Tools.
              LINCS Project Document Series.
INSTITUTION   Center for Applied Linguistics, Washington, D.C.
              Language Information Network and Clearinghouse System.
SPONS AGENCY  National Science Foundation, Washington, D.C.
REPORT NO     CALLINCS-69-15
PUB DATE      Jul 71
GRANT         NSF-GN-771
NOTE          55p.

EDRS PRICE    MF-$0.83 HC-$3.50 Plus Postage.
DESCRIPTORS   Automatic Indexing; Classification; *Indexing;
              *Information Networks; Information Processing;
              *Information Retrieval; Information Science;
              *Information Systems; Lexicography; *Linguistics;
              Search Strategies; Subject Index Terms; *Thesauri;
              *Vocabulary
IDENTIFIERS   *Language Sciences

ABSTRACT

The four chapters included in this report are based
on LINCS project activities undertaken since 1968, with an emphasis
on indexing tools in the language sciences and related problems.
Chapter one, "Indexing Tools for the Language Sciences: Methodology,"
discusses the development of a LINCS thesaurus and its role in the
LINCS network. Chapter two, "Vocabulary and Indexing for LINCS: Some
Preliminary Considerations," discusses LINCS indexing procedures.
Chapter three, "A Preliminary Classification for Language Sciences
Information: Working Outline," discusses the requirements for a
classification system which could constitute a framework for the
LINCS thesaurus. Chapter four, "Vocabulary Control for the LINCS
Reference Management System (RMS)," summarizes the initial indexing
approaches and authority file management techniques which, at this
time, are considered to be optimal for use in the proposed Reference
Management System (RMS), the automated central clearinghouse and
secondary processing facility of LINCS. (Author/AM)



Documents acquired by ERIC include many informal unpublished
materials not available from other sources. ERIC makes every effort
to obtain the best copy available. Nevertheless, items of marginal
reproducibility are often encountered and this affects the quality
of the microfiche and hardcopy reproductions ERIC makes available
via the ERIC Document Reproduction Service (EDRS). EDRS is not
responsible for the quality of the original document. Reproductions
supplied by EDRS are the best that can be made from the original.






CENTER FOR APPLIED LINGUISTICS 

LANGUAGE INFORMATION NETWORK AND CLEARINGHOUSE SYSTEM (LINCS)



PRELIMINARIES TO THE DESIGN OF LINCS INDEXING TOOLS 



Prepared and edited by



Jessica Perry 



Alfred Pietrzyk



U.S. DEPARTMENT OF HEALTH,
EDUCATION & WELFARE
NATIONAL INSTITUTE OF
EDUCATION

THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM
THE PERSON OR ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR
OPINIONS STATED DO NOT NECESSARILY REPRESENT OFFICIAL NATIONAL
INSTITUTE OF EDUCATION POSITION OR POLICY.



LINCS PROJECT DOCUMENT SERIES / NATIONAL SCIENCE FOUNDATION GRANT
CALLINCS-69-15 July 1971 NSF GN-771

CENTER FOR APPLIED LINGUISTICS, 1717 MASSACHUSETTS AVENUE, N.W., WASHINGTON, D.C. 20036






CONTENTS

Note
Chapters

1. Indexing Tools for the Language Sciences: Methodology,
   by Jessica Perry

2. Vocabulary and Indexing for LINCS: Some Preliminary
   Considerations, by F.W. Lancaster

3. A Preliminary Classification for Language Sciences
   Information: Working Outline, by Fred Bauman

4. Vocabulary Control for the LINCS Reference Management
   System (RMS), by Alfred Pietrzyk



NOTE 



The four chapters included in this report are based on LINCS project
activities undertaken since 1968, with an emphasis on indexing tools
in the language sciences and related problems, some of which are also
treated in the following documents of the project series:*

Lewis, Kathleen P., comp. Indexing tools and terminology sources
in the language sciences: A bibliographical listing. LINCS #2-68,
NSF GN-653. Washington, D.C.: Center for Applied Linguistics,
1968, 20 p. (ERIC: ED 021 245).

Pietrzyk, Alfred; Lamberts, Frances; Freeman, Robert R. File
management techniques and systems with applications to information
retrieval: A selective bibliography. LINCS #3-68, NSF GN-653.
Washington, D.C.: Center for Applied Linguistics, 1968, 27 p.
(CFSTI: PB 178 792).

Rosenfeld, Samuel A.; Sable, Jerome. Requirements for LINCS file
management system. LINCS #8-69, NSF GN-771. Washington, D.C.:
Center for Applied Linguistics, 1969, paged by section. (CFSTI:
PB 186 472).

Rappaport, Miriam. Citation patterns in selected core journals for
linguistics. LINCS #13-69, NSF GN-771. Washington, D.C.: Center
for Applied Linguistics, 1971, iii, 23 p.

Garvin, Paul L. Specialty trends in the language sciences. LINCS
#16-69, NSF GN-771. Washington, D.C.: Center for Applied Linguistics,
1969, ii, 29 p. (ERIC: ED 034 983).

Ebersole, Joseph L. Some probable technological trends and their
impact on an information network system. LINCS #3-70, NSF GN-771.
Washington, D.C.: Center for Applied Linguistics, 1970, 13 p.
(CFSTI: PB 192 494).

Zisa, Charles A. Language classification and indexing. With an
annotated bibliography. LINCS #5-70, NSF GN-771. Washington, D.C.:
Center for Applied Linguistics, 1970, 21 p.

Gifford, Carolyn. A survey of indexing tools for the language
sciences. CALLINCS-70-6, NSF GN-771. Washington, D.C.: Center
for Applied Linguistics, 1971.

Rose, Priscilla. Linguistic bibliography count. LINCS #10-70 P,
NSF GN-771. Washington, D.C.: Center for Applied Linguistics, 1971.



*Documents marked "P" are preliminary working papers for limited
circulation only.




Chapter 1 



INDEXING TOOLS FOR THE LANGUAGE SCIENCES: METHODOLOGY
By Jessica Perry



1. Introduction

With the support of the National Science Foundation, the Center for
Applied Linguistics (CAL) has undertaken the responsibility of attempting
to develop a viable information network to serve the users of language
information.

Two questions immediately arise: 1) who are the users of language
information, and 2) what is language information?

The first question cannot be answered definitively, of course, until
there is some sort of information service for those who have serious
information needs in the language sciences to use. One thing, however,
seems to be clear: the users of the evolving Language Information Network
and Clearinghouse System (LINCS) will not all be linguists. Many will
be persons from outside the core discipline of linguistics who need
linguistic information in connection with problems in other fields. At
the same time LINCS will aim to serve the linguist effectively by giving
him various information products that he now lacks or that are scattered
among a wide variety of information sources of uneven quality and
timeliness (see Part II of Freeman, Pietrzyk and Roberts [4]). These two
uses of linguistic information are discussed in detail by Paul Garvin in
Specialty Trends in the Language Sciences [5].

In this same report Garvin also discusses the question of the scope of
linguistic information to be included in LINCS, and his chart on page 22
shows at a glance the relationships between "linguistics" and other fields
as revealed in recent literature. The question of the scope of language
information is obviously the "other side of the coin" of the question of
the users, and the chart as well as the discussion by Garvin and others
can serve as a guideline for the orderly growth and coverage of LINCS.

Using the guidelines given by Garvin and others as the conceptual
framework of LINCS, we next have a series of problems connected with
establishing a prototype LINCS in order to test the viability of the concept.
These problems involve both operational and philosophical considerations
such as the following: 1) How do we develop criteria for selection of
input to LINCS, and how do we set up workable operational acquisition
procedures for individual documents? 2) What will be the optimum
indexing language to enable users with a wide variety of information needs to
find relevant documents within the files of LINCS? 3) How will this
index language interface with the various indexing languages of the ongoing
information centers with which LINCS will be cooperating? 4) Especially
considering the varying backgrounds of potential users of LINCS, where



[The remainder of this passage is illegible in the source copy.]



1) preparation of a small sample of "core" index terms,

2) indexing a small sample of "core" documents by means of these
   terms,

3) searches of the sample by a carefully selected group of
   representative users,

4) evaluation, refinement, and extension of the index language,
   the indexed file, and the user population, followed by further
   searches and evaluation.

3. LINCS Thesaurus

In view of the following considerations, the thesaurus was chosen as
the most advantageous index language for LINCS:

1) LINCS will ultimately have a very large file. The LINCS
   network will create access to the entire world's production
   of language information. The thesaurus with its ease of
   updating and adding index terms and relationships is an ideal
   index tool for very large files.

2) The scope of LINCS will be highly interdisciplinary, as
   the chart in Garvin's report suggests. The thesaurus can
   be structured to accommodate many points of view
   simultaneously.

3) LINCS will provide the necessary switching devices to
   interface with a variety of other information processing
   centers around the world, so that searches can be conducted
   for specific information indexed by different indexing
   languages. The thesaurus offers the requisite flexibility
   to switch from one index language to another.

4) Many of the contributors to Information in the Language
   Sciences [4], as well as Garvin [5], have alluded to the
   fact that language information, and especially "linguistics,"
   is an emerging field with many schools and points of view,
   all of which must be accommodated in the information language
   of LINCS. LINCS obviously cannot be parochial. The thesaurus
   is uniquely able to structure index terminology in a
   "non-partisan" manner, to provide various hierarchical
   arrangements and cross references reflecting various views and
   taxonomies of the field. This capability is doubtless the
   single most important factor in our choice of the thesaurus
   as the indexing language for LINCS.






5) With the anticipated variety of users and uses for the
   information in the files of the LINCS network, a full
   range of generic to specific search capabilities must
   be provided by the index language. The index language
   must also provide, as far as is possible, for those
   unanticipated searches that will undoubtedly result from
   the interdisciplinary or "mission-oriented" use of LINCS.
   These requirements dictate the choice of: 1) an indexing
   language that can be post-coordinated at the time of the
   search, 2) an indexing language structured so that the
   term relationships are made evident to both indexer and
   searcher. The thesaurus will be designed to accomplish
   both of these tasks.

6) Although the ultimate configuration of the LINCS network
   cannot be precisely known at the outset, an indexing
   language that is easy to use, both for indexer and searcher,
   must be provided, especially since it is possible that the
   input and searching processes will be performed at more
   than one location. An adequately documented thesaurus will
   be used in the same way by widely scattered indexers and
   searchers.

7) During the past decade, of all indexing tools, the
   structured controlled vocabulary known as the thesaurus
   has received the most sustained attention by information
   scientists. Its intellectual and physical structure has
   been the object of an enormous amount of effort culminating,
   perhaps, in the monumental Thesaurus of Engineering and
   Scientific Terms (TEST) of Project LEX [2].

   TEST contains a very large and growing indexing vocabulary.
   It is designed to be used in an automated information
   retrieval system where all of the file searching and much of
   the thesaurus construction and maintenance is done by
   computer. Very sophisticated software has been developed for
   these purposes and is available to LINCS for experimentation.

   The thesaurus offers the flexibility of structure and
   maintenance, the semantic controls and the cross referencing
   devices required for the LINCS indexing language by the
   considerations enumerated above.
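
The post-coordination requirement in consideration 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LINCS or Project LEX software: the document identifiers and term assignments are invented, and the function names build_postings and coordinate_search are hypothetical. Each document is indexed under single terms, and the terms are combined only when the search is run.

```python
from collections import defaultdict

def build_postings(indexed_docs):
    """Invert {document: set of index terms} into {term: set of documents}."""
    postings = defaultdict(set)
    for doc_id, terms in indexed_docs.items():
        for term in terms:
            postings[term].add(doc_id)
    return postings

def coordinate_search(postings, *terms):
    """Return the documents indexed under ALL of the given terms.

    The terms are coordinated at search time, not at indexing time.
    """
    if not terms:
        return set()
    result = set(postings.get(terms[0], set()))
    for term in terms[1:]:
        result &= postings.get(term, set())
    return result

# Hypothetical sample file: identifiers and term assignments are invented.
docs = {
    "doc1": {"phonetics", "machine translation"},
    "doc2": {"phonetics", "historical linguistics"},
    "doc3": {"machine translation"},
}
postings = build_postings(docs)
hits = coordinate_search(postings, "phonetics", "machine translation")
```

Because no term combinations are fixed in advance, a search that nobody anticipated when the file was indexed costs nothing extra; it is simply another intersection of term postings.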

4. Construction of Trial Thesaurus: Sources of Vocabulary

It is obvious that to be useful indexing must reflect the search needs of
the user. Ideally, then, it might be proposed that an information storage
and retrieval system should begin with the identification of its users,
followed by the submission and collection of their own terminology for the
indexing language. However, neither time nor money has permitted this
purist approach in the past. Furthermore, we sincerely believe that
sufficient guidelines are now available as to potential users and
potential scope of LINCS for us to begin preparation and pilot testing of
a LINCS thesaurus. Conscientious evaluation studies of the system will
ensure that the indexing tool of the operational LINCS will reflect the
vocabulary and search objectives of its users. Hence, we are beginning
to build the trial thesaurus using selections from the vocabulary sources
described in Lewis [7] and Gifford [6], many of which have indeed been
used to index the literature of linguistics.

5. Guidelines for Thesaurus Construction

A thesaurus is a controlled vocabulary to guide indexers and users.
Its function is to bring the language of the author into coincidence
with the language of the user who will be searching for information
at some later time. How the thesaurus performs this function to a
large extent determines the success or failure of an information
retrieval system. It is therefore of the utmost importance that thesaurus
makers know what they are doing, and that they lay down guidelines for
processing terms, so that all decisions can be made consistently and
in accordance with the purpose of the thesaurus.

Guidelines for thesaurus construction must deal with a wide range of
problems from the most intensely philosophical to the purely mechanical.
In drawing up the guidelines for constructing the LINCS Thesaurus we have
used as a basis the USA Standard Basic Criteria for Indexes (USASI
Standard) and the Guidelines for the Development of Information Retrieval
Thesauri [1], prepared by the Committee on Scientific and Technical
Information (COSATI). Our methodology will be that used by Project LEX to
construct TEST. We are extremely indebted to such persons as Eugene
Wall and others who have covered the same ground previously and left
explicit instructions for thesaurus construction. All that remained for
us to do was to adapt proven guidelines and methodology to the particular
characteristics of linguistics.

6. Guidelines for LINCS Thesaurus Construction

Using the same forms developed for input of terminology to TEST, we have
prepared a sample of "core" terms displayed in thesaural relationships
by the AUTO-LEX Thesaurus Construction and Maintenance Programs. An
excerpt of this thesaurus is displayed in Chapter 4.

In order to use this form, the LINCS staff had to make decisions with
specific reference to the language sciences on all the points listed
in the COSATI Guidelines for thesaurus construction, as well as on some
points not listed, but found from experience to be important. As was
noted above, these decisions are a mixture of intellectual and clerical
points. The list of points and the decisions made for the construction
of the sample LINCS Thesaurus, which will evolve into firm guidelines
for LINCS, are as follows.






1) Thesaurus Introduction

   No introduction for the benefit of indexers and users has
   been written for this sample LINCS Thesaurus, whose purpose
   is mainly to test the thesaurus programs and to display an
   array of linguistic terms in a thesaural structure.

2) Term Selection

   The terms for the initial thesaural display were selected
   intuitively by linguists from available lists of indexing
   and vocabulary terms without specific data on their anticipated
   frequency in indexing or searching. They are all acceptable
   or authentic linguistic terms. Their relationship to other
   terms in the LINCS vocabulary is expected to change somewhat
   after more candidate terms are examined and after controlled
   indexing and searching experiments have been conducted.

3) Noun Form

   The noun form of selected terms will be used in all instances
   where reasonable. For example, when we encounter the term
   parse, we shall enter it as the gerund, parsing.

4) Singular vs Plural

   Although we have not done so in the sample thesaurus, we
   would probably be well advised to adhere to the rule of
   using plurals wherever possible. This rule would prevent
   the noun-verb ambiguity inherent in a term such as affix.

5) Term Ambiguity

   We have tentatively attempted to clarify ambiguous terms by
   the use of parenthetical qualifying expressions, e.g.,
   Phonetics (acoustic), Phonetics (articulatory), and Phonetics
   (auditory). However, it is not yet resolved whether we shall
   ultimately clarify these kinds of ambiguity by qualifying
   notes in parentheses or by listing them as precoordinated
   terms, i.e., acoustic phonetics, articulatory phonetics,
   and auditory phonetics. In other instances we have freely
   included compound terms, such as anthropological linguistics.
   Specific guidelines for the use of one or the other, or both
   of these devices will be developed as more experience is
   gained.

6) Direct vs Inverted Entry

   All terms except those in the language-name list are entered
   in LINCS directly without inversion, e.g., comparative
   linguistics, not linguistics, comparative. Whether or not uniform
   guidelines should be established on this point has yet
   to be considered.



7) Synonyms

   When two or more terms have appeared to be synonymous,
   we have selected one as the preferred term and entered
   the second as a USE reference, e.g., linguistic anthropology
   USE anthropological linguistics.

8) Punctuation

   Except in the inversion of language names (Germanic,
   Western) punctuation has been avoided in the sample
   LINCS Thesaurus.

9) Abbreviated Word Forms

   In the pilot vocabulary sample we have not encountered
   abbreviated word forms or acronyms, but we anticipate
   avoiding their use. For example, we shall use the term
   machine translation, not MT.

10) Alphabetization

   The LINCS sample Thesaurus has been alphabetized according
   to the AUTO-LEX sorting program, which is a (character-
   by-character) sort.

11) Cross References

   The types of cross references as well as their notation
   used in TEST have been used in the LINCS Thesaurus. They
   are:

   Type of cross reference      Notation

   use                          USE
   used for                     UF
   broader term                 BT
   narrower term                NT
   related term                 RT

   In the structured listing the main entry terms are
   displayed in alphabetical order down the left-hand column
   and the cross references are printed out beneath them,
   indented to the right. The use of these cross references
   in the LINCS Thesaurus is as follows:






Use (USE) References

The USE reference leads the user of the
thesaurus from a term that may be a valid
term to the searcher or indexer to the term
that is preferred by the thesaurus. It will
be noted in this example that of the two
terms historical linguistics and diachronic
linguistics, both of which are "valid" in
the general sense, the LINCS Thesaurus does
not consider the latter a search term.
Therefore the indexer and the searcher are given
access to the thesaurus through both terms but
are directed to use historical linguistics as
their indexing or search term. USE is not
optional; it is a directive. The term diachronic
linguistics is not a LINCS term. It is
anticipated that the USE reference will be very useful
in the switching from one indexing vocabulary to
another in the LINCS network.

It should be mentioned that the USE reference,
while it may be used to indicate preference of
one synonym over the other, is not necessarily
restricted to "pure" synonyms, but is used for
those terms which are considered synonymous for
indexing and retrieval purposes.

The USE reference may also be incorporated in
the language name part of the LINCS Thesaurus
to lead the user, for example, from the more
"hierarchically logical" Norse, Old to the
operationally preferred term, Old Norse.

Used For (UF) References

The UF reference is the reciprocal of the USE
reference and performs the same function of
directing the user to the preferred LINCS
term. For example, referring again to the
terms discussed above, directly under the main
entry historical linguistics is the entry UF
diachronic linguistics, whereas directly under the
main entry diachronic linguistics is the
directive, USE historical linguistics.

The criterion for the selection of a USE or UF
reference can in actual practice be almost
arbitrary. They are both simply devices to control
the index terminology of the thesaurus so as to
ensure that indexer and searcher entry
vocabularies, when they mean the same thing,
will be changed so that they can match in
the search of the LINCS file.
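
The reciprocal behavior of USE and UF described above can be sketched as follows. This is a minimal illustration, not the AUTO-LEX programs: the function names build_use_table and preferred_term are hypothetical, and the only data are the three USE examples given in this chapter.

```python
def build_use_table(use_refs):
    """Build the USE mapping and derive the reciprocal UF lists from it.

    use_refs is a list of (entry term, preferred term) pairs.
    """
    use = dict(use_refs)
    used_for = {}
    for entry, preferred in use.items():
        used_for.setdefault(preferred, []).append(entry)
    return use, used_for

def preferred_term(use, term):
    """Follow a USE directive; a term with no USE reference is already preferred."""
    return use.get(term, term)

# The three USE references quoted in this chapter.
use, used_for = build_use_table([
    ("diachronic linguistics", "historical linguistics"),
    ("linguistic anthropology", "anthropological linguistics"),
    ("Norse, Old", "Old Norse"),
])
```

Applying preferred_term to both the indexer's term and the searcher's term before they touch the file is what makes the two entry vocabularies match: either wording of a concept arrives at the same LINCS term.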

Narrower Term (NT) and Broader Term (BT)
References

These two references were developed to
signify class inclusion relationships.
Narrower terms are included in the
meanings of broader terms, and broader terms
include the meanings of narrower terms.
It was in the attempt to develop explicit
guidelines for the assignment of these
references that the essential difference
between linguistics and the "hard" sciences
for which these references were originally
developed became most apparent. The rule of
thumb for thesaurus construction in the hard
sciences is that a narrower term "is a"
[member of the class] broader term. For
example, steels are iron alloys would be designated by

   steels
      BT iron alloys

   iron alloys
      NT steels

However, linguistics is not a hard science.
Its aspects partake of both humanities and
the sciences, social and natural. Since it
faces both ways, so to speak, this seemingly
simple test for the NT-BT relationship is not
feasible for the term arrangement of LINCS,
except for some few terms denoting physical
objects. On the other hand, the usual
subjective way of arranging terms into a hierarchy,
which is usually expressed by "comes under," does
not seem to be a proper criterion in the
construction of a thesaurus. It would inevitably lead to
the kinds of inconsistencies that make traditional
library schemes so subject to criticism despite
their attempts to adhere to principles of
subdivision. Yet LINCS must develop a rule of thumb
for consistent BT-NT relationships. The criterion
which has been used to structure the terms of the
sample thesaurus into BT-NT relationships is "if
you were conducting a search for information indexed
by the broader term, would you always want
information indexed by the narrower term?" This criterion
is explicitly user-oriented and can only
be validated by the users of LINCS. Later
evaluation studies of LINCS will prove whether
this guideline is viable. Of course, for the
construction of the sample thesaurus, CAL was
acting as user and indexer. The usefulness of
this guideline can be illustrated by

   allophone
      BT phoneme

and

   phoneme
      NT allophone

which is to say that in a search for information
on phonemes the user would always want information
on allophones, but not necessarily vice versa,
because of an important policy of coordinate
indexing, i.e., the indexer always assigns the
most specific index term available. Thus while
a user searching for information on phonemes would
always want to see information on allophones, the
user searching for specific information on allophones
would not necessarily be interested in information
about phonemes in general, or in any other aspect
of phonemes. The search program provides for either
kind of search.

The importance of using clear-cut, workable
guidelines for indicating term relationships cannot be
over-emphasized. As the LINCS Thesaurus grows in
size these guidelines will become more critical.
If they are carefully developed and prove to be
operationally feasible, they will ensure consistency
of structure when terms are added to the LINCS
Thesaurus, which will in turn ensure consistency
of search results.
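
The asymmetry between searching on a broader term and searching on a narrower term can be sketched as below. This is an illustration only, assuming a simple table of NT links; the function names narrower_closure and broader_from_narrower are hypothetical, and the two hierarchies are the ones quoted in the text.

```python
def narrower_closure(nt, term):
    """Return the term together with every term reachable through NT links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(nt.get(current, ()))
    return seen

def broader_from_narrower(nt):
    """Derive the reciprocal BT table from the NT table."""
    bt = {}
    for broad, narrows in nt.items():
        for narrow in narrows:
            bt.setdefault(narrow, []).append(broad)
    return bt

# NT links taken from the two examples in the text.
nt = {"phoneme": ["allophone"], "iron alloys": ["steels"]}
bt = broader_from_narrower(nt)
```

A generic search on phoneme expands to both phoneme and allophone, while a specific search on allophone expands to allophone alone, which matches the stated criterion. Skipping the expansion step gives the other kind of search that the text says the search program provides.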

Related Term (RT) References

The RT reference is used to refer from index
terms to other index terms which are related,
but not hierarchically, i.e., that are neither
broader nor narrower. Since in the final
analysis every term in the file is related in
some way, extreme caution should be exercised in
assigning the RT reference. The fact that terms
are indeed related in some unspecified way is not
sufficient reason to indicate the RT relationship.
The guideline for assigning this relationship
should be: "Would the user appreciate being
reminded that the related term is available
for searching?" We have used the RT reference
sparingly in the LINCS sample thesaurus, and are
not sure that it will be useful where we have
used it, as, for example,

   comparative linguistics
      RT descriptive linguistics

The AUTO-LEX programs give the option of
indicating the reciprocal of this relationship
or suppressing it.
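
The choice between recording or suppressing the reciprocal RT reference can be sketched as a single switch. This is a hedged illustration of the idea, not the AUTO-LEX option itself: the function name related_terms and its reciprocal parameter are assumptions made for the example.

```python
def related_terms(rt_pairs, reciprocal=True):
    """Build an RT table from (term, related term) pairs.

    With reciprocal=True the reverse reference is recorded as well;
    with reciprocal=False it is suppressed.
    """
    table = {}
    for term, related in rt_pairs:
        table.setdefault(term, set()).add(related)
        if reciprocal:
            table.setdefault(related, set()).add(term)
    return table

# The single RT example given in the text.
pairs = [("comparative linguistics", "descriptive linguistics")]
with_reciprocal = related_terms(pairs)
without_reciprocal = related_terms(pairs, reciprocal=False)
```

Unlike BT and NT, the RT link carries no direction of inclusion, so whether the reminder should run both ways is an editorial decision for each pair and not something the structure dictates.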

7. The Role of the LINCS Thesaurus in the LINCS Network

The LINCS Thesaurus promises to be particularly useful as a
switching device in the LINCS network. If properly constructed
it can be used to translate the various index languages used by
the various centers comprising the network into the index language
of LINCS. This capability becomes particularly important when one
considers the bulk of material on the subject of the language
sciences that is indexed in countries other than the United States,
to which the LINCS network will give access. As an example of how
the switching process might work between LINCS and a documentation
center overseas using the Universal Decimal Classification to index
language-related information, we refer to the discussion by Robert
Freeman on the subject. Freeman describes several potential
solutions to the problem of gaining access to documents written in a
foreign language to show the effectiveness of UDC to surmount
language barriers:

   A third solution, which is attractive despite
   the greater effort which would be required to
   implement it, would be to permit indexing and
   searching to be done using a controlled natural-
   language vocabulary of local choice. A part of
   the system would then be a table of equivalences
   between the UDC and the natural language
   vocabulary. The result would be to take advantage
   of the hierarchical notation of the UDC without
   even requiring that the user be familiar with
   the UDC. In addition, since the UDC would be
   the internal form of indexing, users in any
   center could direct queries to the file, without
   regard to the original language in which the
   indexing was done. [3]






In Freeman's "third solution" the English "table of equivalences
between the UDC and the natural-language vocabulary" could be
incorporated into the LINCS Thesaurus in such a way that UDC numbers
could be constructed by people, or possibly by computer, and
searched by matching the request translated into UDC at the centers
where UDC is used, thus avoiding the necessity of phrasing the
query in a foreign language, or conversely, knowing the
classification.
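
A table of equivalences of the kind Freeman proposes can be sketched as a pair of lookups. Everything in this sketch is an assumption made for illustration: the notations "X1" and "X2" are placeholders and not real UDC numbers (the source gives none), and the names EQUIVALENCES, USE, to_udc and from_udc are hypothetical.

```python
# Placeholder notations: "X1" and "X2" stand in for UDC numbers,
# which are not given in the source.
EQUIVALENCES = {
    "historical linguistics": "X1",
    "machine translation": "X2",
}

# USE reference taken from the text, so that non-preferred entry terms also switch.
USE = {"diachronic linguistics": "historical linguistics"}

def to_udc(term, use=USE, table=EQUIVALENCES):
    """Translate a natural-language term into its class notation.

    Returns None when no equivalence has been recorded for the term.
    """
    preferred = use.get(term, term)
    return table.get(preferred)

def from_udc(notation, table=EQUIVALENCES):
    """Return the preferred terms recorded as equivalent to a notation."""
    return sorted(term for term, n in table.items() if n == notation)
```

A query phrased in the searcher's own vocabulary is first resolved through USE and then switched to the notation, so the searcher needs neither the foreign language nor the classification.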

8. LINCS Microthesauri

As in any large information network where various member centers
process specialized information, individual centers in the LINCS
network will require more specific index terminology than will be
useful for central LINCS. For these centers subsets of the LINCS
Thesaurus can be extracted and used as a basis for more detailed
microthesauri which will permit the specialized centers to index
to any desired specificity. These microthesauri will in turn be
input to the internal LINCS Thesaurus so as to be available to all
LINCS indexers should they need the specialized terms. We are
tentatively planning to use the entire thesaurus including the
microthesauri in the LINCS system to act as an internal device
to enable the indexing language to be controlled and standardized.
The following tree is an illustration of how the microthesauri
may be used as an internal control. Upper case letters represent
terms in the "open" LINCS Thesaurus. Lower case letters represent
terms in various microthesauri.

                    A

          B         C         D

       f  g  h     i  j       E

                           f  h  k

Note that terms f and h are placed in two separate hierarchical
arrangements. With such a structure used internally, searches can
be made for the specific terms f and h regardless of their
hierarchical arrangement. At a more generic level, say B or E,
questions can be negotiated to give the user the option of either
hierarchy. This concept would give LINCS the flexibility it must
have to handle various collections of language-related information
indexed according to different points of view and taxonomies.
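
A minimal Python sketch of such a structure follows. The parent-child
links are an assumption: they represent one plausible reading of the
tree (f, g, h under B; i, j under C; E under D; f, h, k under E), and
the names CHILDREN, parents_of and narrower are hypothetical.

```python
# One assumed reading of the illustrative tree. Upper case letters are
# "open" thesaurus terms, lower case letters are microthesaurus terms.
CHILDREN = {
    "A": ["B", "C", "D"],
    "B": ["f", "g", "h"],
    "C": ["i", "j"],
    "D": ["E"],
    "E": ["f", "h", "k"],
}


def parents_of(term):
    """Every broader term under which `term` is filed."""
    return sorted(p for p, kids in CHILDREN.items() if term in kids)


def narrower(term):
    """All terms below `term`, at any depth."""
    found = set()
    stack = list(CHILDREN.get(term, []))
    while stack:
        t = stack.pop()
        if t not in found:
            found.add(t)
            stack.extend(CHILDREN.get(t, []))
    return sorted(found)
```

A search on the specific term f succeeds whichever hierarchy it was
indexed under, while parents_of("f") lists the broader terms that could
be offered to the user when a question is negotiated at a more generic
level.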






References 



[1] Committee on Scientific and Technical Information (COSATI).
    Guidelines for the development of information retrieval
    thesauri. Washington, D.C.: Government Printing Office, 1967.

[2] Department of Defense, Project LEX. Thesaurus of Engineering and
    Scientific Terms (TEST). Washington, D.C.: Government Printing
    Office.

[3] Freeman, Robert R. "Actual and potential role of the Universal
    Decimal Classification," In: Robert R. Freeman, Alfred Pietrzyk,
    and A. Hood Roberts, eds. Information in the language sciences.
    Proceedings of the conference held at Airlie House, Warrenton,
    Virginia, March 4-6, 1966, under the sponsorship of the Center
    for Applied Linguistics. Mathematical linguistics and automatic
    language processing 5. New York: American Elsevier, 1968,
    149-163.

[4] Freeman, Robert R.; Pietrzyk, Alfred; Roberts, A. Hood, eds.
    Information in the language sciences. Proceedings of the
    conference held at Airlie House, Warrenton, Virginia, March
    4-6, 1966, under the sponsorship of the Center for Applied
    Linguistics. Mathematical linguistics and automatic language
    processing 5. New York: American Elsevier, 1968, xi, 247 p.

[5] Garvin, Paul L. Specialty trends in the language sciences.
    LINCS #16-69, NSF GN-771. Washington, D.C.: Center for Applied
    Linguistics, 1969, ii, 29 p. [ERIC: ED 034 983].

[6] Gifford, Carolyn. A survey of indexing tools for the language
    sciences. CALLINCS-70-6, NSF GN-771. Washington, D.C.: Center
    for Applied Linguistics, 1971.

[7] Lewis, Kathleen P., comp. Indexing tools and terminology sources
    in the language sciences: A bibliographical listing. LINCS #2-68,
    NSF GN-653. Washington, D.C.: Center for Applied Linguistics,
    1968, 20 p. [ERIC: ED 021 245].






Chapter 2



VOCABULARY AND INDEXING FOR LINCS: SOME PRELIMINARY CONSIDERATIONS
By Lancaster

1. Requirements

The choice of indexing procedures and index language for LINCS will be
dictated by: 1) the products and services to be provided, and 2) the
organizational characteristics of LINCS itself.

LINCS will be a multipurpose system, generating a number of different
products and services. Such products will probably include published
indexes and abstracting journals, other current awareness devices
including some form of SDI (on a group or individual basis), and
retrospective search capabilities. It is important that the indexing and
index language adopted should be capable of generating all of these
products. That is, from a single input operation we must create an
indexed data base from which all bibliographic services can be
produced without further indexing modification. We do not want to index
by one method for one service and a different method for another.
Nor do we want the complication of having to produce complex algorithms
to translate from one vocabulary to another (e.g., from a
classification scheme to subject headings).

It is expected that LINCS will consist of a network (loosely
structured) of information centers in the language sciences with both
primary and secondary nodes. At the present time we expect that
many of the operations of LINCS will be largely decentralized (as
they are, for example, in the MEDLARS and ERIC networks). We expect
to receive inputs (in the form of index records and/or abstracts)
from several of these network components. The "LINCS Central" will
be largely a network management center with responsibilities for
policy, coordination, review, publication, quality control and
network switching activities. Because network participation is likely
to involve voluntary cooperative arrangements, indexing procedures
are best kept relatively simple. We would like to avoid highly
complex indexing methods or highly sophisticated indexing languages if
the application of these would put an excessive burden on
participating centers and thus tend to discourage full cooperation.
Moreover, the LINCS network will incorporate information centers already
in existence. Some of these components already produce document
surrogates, of one type or another, for their own purposes. We
would like to avoid duplication of effort by making use of these
surrogates, intact or with minor modification, in the LINCS network
as a whole. If necessary, we would want to convert from the
vocabulary of an existing center (automatically or semi-automatically,
for example, by vocabulary conversion tables to allow mapping
operations) to the vocabulary of LINCS, thereby allowing the center to
continue to index to meet its own specialised requirements but, at the
same time, to be providing input compatible with LINCS requirements.

2. Vocabulary Alternatives

The following possible vocabulary approaches exist for consideration:

     1) A carefully controlled, highly structured vocabulary in
        the form of a thesaurus, list of subject headings or
        classification scheme.

     2) Free assignment of keywords or key phrases by indexers, as
        for example, in the technique of title expansion. Free use
        of keywords would perhaps be coupled with the use of some
        broad codes for subjects, countries, languages, etc.

     3) Natural language searching and processing of abstracts,
        extracts or other document representation in machine-
        readable form.

     4) Machine extraction of keywords or phrases.

     5) Machine assignment of descriptors selected from a controlled
        vocabulary.



Before discussing the pros and cons of these various approaches, let us
consider some general trends in vocabulary usage for information
retrieval at the present time. It appears clear that there is a general
move toward simplicity in the exploitation of information retrieval
systems. Such complexities as role indicators and similar syntactic
devices are disappearing or are used very sparingly. The approach of
natural-language searching, with comparatively little vocabulary
control, is more popular now than previously. There are several reasons:

     1) Experiments and operational experience have shown that
        natural-language systems can be made to work effectively.

     2) Machine-readable corpora, by-products of photocomposition
        or various other keyboarding operations, are becoming widely
        available.

     3) Natural-language searching is more attractive for on-line
        implementation than it was for batch-processing systems.

Various government agencies provide examples of the move toward
simplification. Several years ago, certain major information systems
utilised a highly sophisticated indexing, requiring skilled indexers and
based upon a detailed classification scheme. Now these information
systems use a relatively shallow indexing, based in some cases upon
geographic codes, a broad and much abbreviated subject code (about 250
classes) and uncontrolled keywords extracted from document titles or
added to document titles. Indexing is now conducted by personnel with no
more than a high school education, some of whom can index 100-125
documents per day. Indexing costs have thus been reduced dramatically.
Further justification for systems based on some form of natural-language
indexing and searching is provided by the following evidence:

     1) In the comparison of index language devices conducted by the
        ASLIB Cranfield Project, it was clearly demonstrated that optimum
        retrieval results were achieved with natural language and the
        single device of term coordination. Only synonym control, and
        the confounding of word endings, improved on single-term
        natural-language searching. The more highly controlled
        "conceptual" index languages were out-performed by
        natural-language, single-term searching [1].

     2) Salton, working with small experimental collections in several
        subject fields, has consistently produced acceptable results by
        fully automatic methods. In particular, the SMART-MEDLARS
        comparison suggests that automatic information systems, based on
        searching of natural-language abstracts, may now be able to
        perform as well as present-generation mechanized systems based on
        humanly-assigned index terms [3]. Salton's best results have
        usually been obtained with the less sophisticated of his search
        options.

     3) Although no national information center has set up a retrieval
        system of this type, operating information systems based on
        natural language do exist and have been shown to function
        effectively. Perhaps the most notable of these is the legal
        retrieval system established by Horty at the University of
        Pittsburgh [4]. These retrieval functions have now been taken
        over by the Aspen Systems Corporation.

Increased impetus to natural-language retrieval methods is given by the
present availability of program packages for natural-language processing,
including the IBM Document Processing System, which has been adopted by
several large organizations.

With the foregoing background behind us, let us now consider the
appropriateness of the various vocabulary alternatives for LINCS
requirements.

Alternative 1 is perhaps the safest approach. Most large information
services do make use of a structured, carefully controlled vocabulary.
Such a vocabulary, in the form of a thesaurus or list of subject
headings, is capable of being used to produce the range of products
planned for LINCS. MEDLARS, for example, uses a controlled vocabulary of
this type and, from a single indexing operation, is able to produce
printed indexes, demand searches and SDI service. It is relatively easy
to achieve vocabulary compatibility when a controlled thesaurus is used.
Any specialized vocabularies existing in cooperating centers can become
microthesauri within the framework of the general system thesaurus. This
can be achieved by human mapping operations, leading to the production of
machine-readable conversion tables. For example, the specialized
vocabulary of the Parkinson's Disease Information Center has been mapped
to the MEDLARS vocabulary of Medical Subject Headings. Some of the
mapping may be done automatically if experiments with mapping algorithms,
conducted by Wall, prove successful.

Once the vocabulary mapping has taken place, it is possible for the
specialised center to index materials using its own vocabulary and
indexing procedures but to have this indexing converted automatically to
the vocabulary of the central system. Thus, one indexing operation
serves both needs.
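
As a rough sketch of how a machine-readable conversion table might be
applied, consider the Python below. Everything in it is assumed for
illustration: the table entries are invented and are not taken from any
actual center's vocabulary, and the names CONVERSION_TABLE and convert
are hypothetical.

```python
# Hypothetical conversion table: a center's local index terms mapped to
# one or more terms of the central vocabulary (entries are invented).
CONVERSION_TABLE = {
    "tone sandhi": ["Pitch (Intonation, Tone)"],
    "vowel harmony": ["Segmental Phonology"],
    "language lab drills": ["Language Laboratories", "Teaching Methods"],
}


def convert(local_terms, table):
    """Map local index terms to central terms.

    Returns (central_terms, unmapped). Unmapped terms are reported so
    that a human can extend the table rather than lose the indexing.
    """
    central, unmapped = [], []
    for term in local_terms:
        targets = table.get(term.lower())
        if targets is None:
            unmapped.append(term)
            continue
        for target in targets:
            if target not in central:  # avoid duplicate central terms
                central.append(target)
    return central, unmapped
```

The center keeps indexing in its own terms, and the central record is
derived mechanically. Terms absent from the table are returned
separately, which is where the human mapping effort would be spent.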

Another advantage of a fully controlled vocabulary is that, generally
speaking, it improves search efficiency, reduces the burden on the
searcher and may obviate the need for screening of system output before
results are delivered to the user. The principal disadvantages are that
the use of a controlled vocabulary (at least a large one) will usually
lead to fairly expensive indexing (because of the look-up operations
involved) and maintenance and updating of the vocabulary will also be a
relatively expensive operation. Moreover, for efficiency, vocabulary
control operations usually need to be centralized. Decentralisation can
lead to many problems. A further possible disadvantage for LINCS is the
fact that indexing using a large structured vocabulary is a relatively
sophisticated operation requiring skilled indexers at the various
participating centers. These indexers would need some training and also
would be required to follow indexing rules and guidelines. These factors
may reduce center tolerance to full participation in the LINCS network.

Alternative 2 is an attractive possibility. This would involve an
indexing process whereby an indexer would assign some relatively broad
subject codes, possibly some language or geographic codes, and several
uncontrolled keywords. The keywords would probably be selected from the
significant words occurring in titles plus additional significant words
from the abstract or full text. These additional words may, in fact, be
added by the indexer to the title to form an expanded title. Such indexing
can be effected by a text-marking operation, as in the following example:

     (Mechanical) (Semantic Analysis) and the (Compatibility) of
     (English) (Adjectives) [(Protosynthex III)]

in which each word or phrase enclosed within parentheses has been selected
as a "keyword" and the expression enclosed in square brackets has been
added to the title to allow it to be picked up as an index term and also
perhaps to clarify the title.
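
The marking convention is simple enough to be read mechanically. The
Python below is a sketch under assumptions: it supposes that marked
titles are keyboarded exactly as in the example, with keywords in
parentheses and indexer additions in square brackets, and the function
name parse_marked_title is hypothetical.

```python
import re

MARKED = ("(Mechanical) (Semantic Analysis) and the (Compatibility) of "
          "(English) (Adjectives) [(Protosynthex III)]")


def parse_marked_title(marked):
    """Split a marked title into original title, keywords and additions."""
    # Keywords: every parenthesized word or phrase, in order of occurrence.
    keywords = re.findall(r"\(([^()]*)\)", marked)
    # Additions: bracketed expressions supplied by the indexer.
    added = [
        re.sub(r"[()]", "", a).strip()
        for a in re.findall(r"\[([^\[\]]*)\]", marked)
    ]
    # Original title: drop the bracketed additions, then the parentheses.
    title = re.sub(r"\s*\[[^\[\]]*\]", "", marked)
    title = re.sub(r"[()]", "", title).strip()
    return {"title": title, "keywords": keywords, "added": added}
```

One keyboarding pass thus yields the title as published, the keyword
list for the index, and a record of which terms were supplied by the
indexer rather than the author.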

While the use of uncontrolled keywords alone can lead to much semantic
ambiguity and noise, the joint use (as retrieval coordinates) of keywords
with broad subject and/or geographic codes produces a very powerful
retrieval capability. The broad codes provide context for the keywords
and reduce ambiguities. For example:

     STRIKE associated with JORDAN

     STRIKE associated with UNITED KINGDOM

If the former association occurs frequently it probably refers to a
military context, the strike force. If the latter association occurs
frequently it probably refers to a labor dispute.

The joint use of uncontrolled keywords and broad codes frequently allows
a searcher to "zero in" on quite a small segment of a document file.
For example, the strategy

     LAMB (keyword) and AUSTRALIA and UNITED KINGDOM (geographic codes)

will almost certainly retrieve documents relating to export of lamb
from Australia to the United Kingdom and one cannot readily visualise
much irrelevancy in this search.
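
The strategy above amounts to a conjunction over two kinds of retrieval
coordinates. A minimal Python sketch follows, with an invented four-record
file and hypothetical field names (keywords, geo).

```python
# Invented records: uncontrolled keywords plus broad geographic codes.
RECORDS = [
    {"id": 1, "keywords": {"LAMB", "EXPORT"},
     "geo": {"AUSTRALIA", "UNITED KINGDOM"}},
    {"id": 2, "keywords": {"LAMB"}, "geo": {"NEW ZEALAND"}},
    {"id": 3, "keywords": {"STRIKE"}, "geo": {"UNITED KINGDOM"}},
    {"id": 4, "keywords": {"WOOL"},
     "geo": {"AUSTRALIA", "UNITED KINGDOM"}},
]


def search(records, keywords=(), geo=()):
    """Return ids of records carrying ALL requested keywords and codes."""
    wanted_keywords, wanted_geo = set(keywords), set(geo)
    return [
        r["id"] for r in records
        if wanted_keywords <= r["keywords"] and wanted_geo <= r["geo"]
    ]
```

The keyword LAMB alone retrieves two records in this toy file. Adding the
two geographic codes narrows the result to the single record about the
Australia to United Kingdom trade, which is the "zeroing in" effect
described above.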

This type of system, with an extremely large uncontrolled keyword
vocabulary, is currently being used very successfully in retrospective
search systems of major agencies whose document collections
grow at the rate of about 250,000 documents per year. Such systems
have been shown to be feasible for SDI as well. It should also be
suitable for LINCS published indexes, the broad subject categories being
used for publication arrangement and the keywords for subject indexes.

For LINCS purposes this approach offers certain definite advantages.
The approach, which is along the lines of procedures already used to
produce indexes to such publications as The Finite String, should find
ready acceptance at the various LINCS centers. Indexing is cheap and
easy to accomplish and does not require an extensive investment in
training programs and materials. The method is flexible enough to allow
inputs in many different forms and from many different sources. It
would be easy to integrate inputs from LINCS Central, LINCS Centers and
many outside sources. There is no reason why relevant inputs from other
information services (CFSTI, MEDLARS, for example) could not be
incorporated into LINCS intact, using the indexing terms assigned by
these centers as "keywords" in LINCS.

The problem of compatibility and convertibility between centers would
be virtually eliminated if this approach were adopted. Further
advantages are:

     1) a highly specific, dynamic vocabulary reflecting current
        usage of terminology in the language sciences;

     2) immediate implementation, without waiting for the
        completion of a thesaurus, and initiation of a
        training program.

Possible disadvantages are:

     1) increased burden on searchers;

     2) increased screening costs.

It should be noted that the use of an uncontrolled keyword vocabulary
in indexing does not necessarily mean that no vocabulary control will
be used in searching. Usually, some form of thesaurus or other logical
grouping of terms will be needed to assist the searcher in construction
of efficient search strategies.
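
Vocabulary control applied at search time rather than at indexing time
can be sketched as query expansion over term groups. The Python below is
a hypothetical illustration: the groups and the inverted index are
invented, and the names TERM_GROUPS, expand and search are assumptions.

```python
# Invented search-time groupings of uncontrolled keywords.
TERM_GROUPS = [
    {"MACHINE TRANSLATION", "MECHANICAL TRANSLATION", "AUTOMATIC TRANSLATION"},
    {"LEXICOGRAPHY", "DICTIONARIES"},
]


def expand(term, groups=TERM_GROUPS):
    """Return the term together with every term grouped with it."""
    term = term.upper()
    expanded = {term}
    for group in groups:
        if term in group:
            expanded |= group
    return sorted(expanded)


def search(index, term):
    """Look up every expanded term in an inverted index (term -> doc ids)."""
    hits = set()
    for t in expand(term):
        hits |= index.get(t, set())
    return sorted(hits)
```

Indexers remain free to use whatever keyword the document suggests. The
grouping is consulted only when a search is formulated, so it can be
built up gradually from the keywords actually assigned.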

Alternative 3, natural-language processing of abstracts, has also been
proved (e.g., in SMART, in BROWSER developed by Williams of IBM) feasible
for both retrospective search and SDI. However, some broad categorization
scheme would still need to be employed as the basis for organisation of
abstracts in publications. The method is attractive for LINCS because a
mechanism already exists for acquisition of abstracts, although not in
machinable form. The production of abstracts may be more acceptable to
centers than a formal indexing procedure. The language of the abstracts
would yield a highly specific, dynamic vocabulary. Vocabulary
maintenance costs need not be very high although some logical grouping of
terms would be required to assist the searcher and improve search
efficiency. Such program packages as the IBM Document Processing System
exist to allow natural language searching of this type.

Implementation does require that abstracts be acquired for all items
entering into the system and that these abstracts be put into
machine-readable form. However, it is likely that most LINCS publications
would require the acquisition and keyboarding of abstracts in any case.

Alternative 4, machine extraction of keywords or phrases, has several
of the advantages of Alternative 2. However, all programs for machine
extraction (e.g., Klingbiel's) [2] are still experimental and no fully
operating system exists to my knowledge. Moreover, many of the entry
procedures for machine extraction (by statistical and/or syntactic
criteria) have not been conspicuously successful. Machine extraction
involves the manipulation of at least an abstract in machine-readable
form, so that we would not avoid this input cost.

If we go to the cost of capturing an abstract in machinable form, a term
extraction procedure has little to commend it over free text searching
of the complete abstract and requires much more complex and costly
programming. This approach is definitely not recommended for LINCS at
present.






Alternative 5, machine assignment of descriptors based on analysis of
natural language text, is the most difficult to accomplish and has not
been achieved very successfully in experiments thus far. It requires
a machine-readable abstract, programming complications are increased, and
the resulting retrieval system has less flexibility and specificity than
one based on searching of natural-language text. This alternative is
least attractive to LINCS at present.

On the basis of the above considerations it is obvious that at least
three alternatives appear entirely feasible for LINCS implementation.
All in all, however, considering the total LINCS requirement and in the
light of our previous discussions on the subject, I am inclined to favor
Alternative 2 as being probably least expensive and most readily
implemented. The adoption of Alternative 2 at present does not preclude
the possibility of switching to natural-language searching of abstracts at
a later date (when the LINCS retrieval system is on-line and fully
operational, say) if such a switch appears desirable. Indeed, it does
not even preclude the possibility of switching at a later time to a fully
controlled, structured vocabulary. In fact, the keyword vocabulary
assembled in the uncontrolled indexing process will provide valuable raw
material for continued thesaurus building. For this reason I favor
continuance of work on the thesaurus. Some type of structured vocabulary
will later be necessary as a searching aid in any event.






References 



[1] Cleverdon, C.; Keen, M. Factors determining the performance of
    indexing systems. Vol. 2. Test results. Cranfield, England: ASLIB
    Cranfield Project, 1966.

[2] Klingbiel, P.H. Machine-aided indexing. Alexandria, Va.:
    Defense Documentation Center, June 1969. [NTIS: AD 696 200]

[3] Salton, G. A comparison between manual and automatic indexing
    methods. Ithaca, N.Y.: Cornell University, Department of
    Computer Science, March 1968.

[4] Springer, E.W.; Horty, J.T. Searching and collating the welfare
    laws of Pennsylvania by computer. Pittsburgh, Pa.: University
    of Pittsburgh, Health Law Center, September 1962. [NTIS: PB 164 437]






Chapter 3 



A PRELIMINARY CLASSIFICATION FOR LANGUAGE SCIENCES INFORMATION: WORKING 
OUTLINE 

By Fred Bauman

1. Introduction 

There have been many classification schemes for linguistics; George
Trager's 1945 scheme [4] is perhaps the most detailed of these, although
others such as the linguistics sections of the Library of Congress
Classification and the Universal Decimal System are much more actively in
use. It is not the purpose of this outline, however, to discuss these
classification systems; this work has already been performed by Carolyn
Gifford in A survey of indexing tools for the language sciences [1], and
adequate bibliographical references can be found in Kathleen P. Lewis'
Indexing tools and terminology sources in the language sciences: A
bibliographical listing [2]. This outline will, rather, first briefly
discuss the pragmatic requirements for a classification system which
could be used primarily as a framework for the thesaurus presently being
prepared for the LINCS system, and then present a preliminary
classification which attempts to meet some of these requirements.

Two important points about LINCS must first be made, because they
influence the kind of classification system needed: 1) LINCS covers not
just those fields which fall under a narrowly defined "linguistics" but
rather the whole range of fields in which language is an important factor,
i.e., the language sciences; 2) LINCS is an information network and as
such must be primarily concerned with meeting the information needs of
workers in the various fields of the language sciences. These two
important factors influence both the scope and the structure of the
classification system presented below.

Scope. Because the LINCS system attempts to cover the whole range of the
language sciences, the classification system must include a wide range of
fields. The present classification does this by an initial four-part
pragmatic division of the field into (1) Core Linguistics, which includes
the traditional fields of linguistic endeavor; (2) Hybrid Linguistics,
which includes those fields where linguistics interacts with another
field of knowledge such as Sociology, Psychology or Mathematics; (3)
Related Fields, which includes those non-language fields where
developments may have important consequences for the Language Sciences;
and (4) Languages.

Meeting User Needs. The second important point is that the classification
system presented below is designed to meet the needs of the users of
LINCS. The field of the Language Sciences has, accordingly, been defined
not in terms of intellectually or theoretically established hierarchies
but rather in terms of the literature in the language sciences in so far
as it reflects the work and the interests of researchers and scholars.
Thus, the classification gives prominence to those fields which are
prominent in the literature currently being produced.

Of great value in determining current fields of interest were the LINCS
Reference Groups (see Chapter 4) and Priscilla Rose's Linguistic
Bibliography Count [3]. The latter work was especially useful in deciding
which fields in the classification required detailed hierarchical
breakdowns. Thus, a field like onomastics, which in the 1966 Linguistic
Bibliography was represented by only 38 entries, would not seem to
require, for present purposes, the extensive breakdown provided by the
Trager classification system, whereas fields like the Linguistic
Bibliography's "Mathematical Linguistics," which is represented by 151
entries, would certainly seem to demand further breakdowns, such as
provided in the preliminary classification outline presented below, where
this area is covered by "Mathematical Linguistics" and "Language and
Automation" and their subfields.

Response to user needs was also an important consideration in those
instances where, because of the prominence of certain fields, they are
given equal status with other fields to which they might actually seem
subordinate. Thus "Teaching English as a Second or Foreign Language"
might be thought of as a subgroup of "Foreign Language Education" but
because of the importance of "Teaching English as a Second or Foreign
Language" as reflected in the large number of publications in this area,
it has been placed on the same level with "Foreign Language Education."

The chief features of the preliminary classification outline are, then,
its broad scope, and its attempt to reflect the fields of the language
sciences as represented in published literature. Since these are the
requirements of the LINCS system, it is hoped that the present
classification will be adequate to serve as a basis for work on the LINCS
Thesaurus as well as for work on a more detailed classification for the
language sciences.

2. Preliminary classification outline

[CORE LINGUISTICS]

THEORETICAL AND DESCRIPTIVE LINGUISTICS

     Phonology
          Segmental Phonology
               Phonetics
                    Acoustic Phonetics
                    Articulatory Phonetics
               Phonemics
                    Distinctive Feature Analysis
          Prosody [Suprasegmental Phonology]
               Loudness, Stress, Amplitude
               Timing (Length, Rhythm)
               Pitch (Intonation, Tone)
               Combinatory Phenomena (Emphasis, Juncture,
                    Syllabification)
     Grammar
          Morphology
          Syntax
     Morphophonemics
     Discourse (Analysis)
     Lexicon
          Lexicology and Lexicography
          Etymology
          Onomastics
     Semantics
          Structural Semantics
          Semantic Theory
     Orthography/Graphemics

CONTRASTIVE LINGUISTICS

     Theories of Contrastive Linguistics
     Error Analysis
     Contrastive Analysis

COMPARATIVE AND HISTORICAL LINGUISTICS

     Processes of Language Change
     Language Reconstruction (Comparative Method)
     Areal Linguistics

LANGUAGE CLASSIFICATION

LANGUAGE UNIVERSALS

LINGUISTIC THEORIES

     Transformationalism
     Stratificationalism
     Tagmemics
     Case Grammar
     Prague School and Neo-Praguians
     American Structuralism
     Other

HISTORY OF LINGUISTICS



[HYBRID/HYPHENATED LINGUISTICS]

LANGUAGE AND BEHAVIOR

     Theories of Verbal Behavior
     Psycholinguistics
          Intellection
               Cognition
               Memory and Recall
          Child Language
               Prelinguistic Vocalization
               Development of Language in the Individual
          Psychoacoustics
     Biolinguistics
          Neurolinguistics
     Pathologies of Language Behavior
          Aphasia
          Non-aphasic speech pathology
          Non-aphasic dyslexia
          Psychopathology
     Psycholinguistic Aspects of Bilingualism

LANGUAGE AND EDUCATION

     Language Learning and Teaching (General)
          Theory of Language Learning/Teaching
          Physiology and Psychology of Language Learning
          Technology of Language Education
               Audiovisual Techniques
               Programmed Learning
               Self-Instructional Techniques and Materials
               Teaching Methods
               Language Laboratories
               Evaluation of Language-Learning Technologies
          Methodology (Other than "Technology of Language Education")
          Teaching Materials (Other than "Technology of Language
               Education")
          Language Testing
               Achievement
               Aptitude
               Proficiency
          Curriculum Studies
          Teacher Education
          Analysis and Teaching of Cross-Cultural Context
     Foreign Language Education [See also "Language Learning
          and Teaching."]
     Teaching English as a Second or Foreign Language [See also
          "Language Learning and Teaching."]
     Native Language Teaching [See also "Language Learning and
          Teaching."]
          Language Arts
     Social Dialects and Education
          Standard Dialect for Speakers of Other Dialects
     Bilingual Education

LANGUAGE AND SOCIETY

     Sociology of Language [Fishman's Macrosociolinguistics]
          National Language Situations
          Language Planning
               Language Policies
               Language Standardization
          Ethnic Minority Problems
          Literacy
          Bilingualism as a Group Phenomenon
               Description
               Theory
          Languages in Contact
          Diglossia
          Bidialectism as a Group Phenomenon
     Sociolinguistics [Fishman's Microsociolinguistics]
          Social Dialect Description
          Small Group Communication
          Technical and Other Functional Styles
          Bilingualism as an Individual Phenomenon
               Description
               Theory

LANGUAGE AND CULTURE [See also "Anthropology."]

     Linguistics and Anthropology
     Ethnolinguistics
     Ethnography of Communication

DIALECTOLOGY

     Linguistic Geography
          Linguistic Atlases
     Dialect Descriptions

LINGUISTICS AND THE HUMANITIES

     Linguistics and Literature
          Stylistics
          Content Analysis
     Linguistics and Other Humanities

PHILOSOPHICAL LINGUISTICS

MATHEMATICAL LINGUISTICS

     Mathematical Models in Linguistics
     Quantitative Linguistics

LANGUAGE AND AUTOMATION

     Computational Linguistics
          Automatic Language Processing
          Computer Aids to Linguistic Analysis
          Mechanical Translation
     Linguistics and Information Science
     Man-Machine Communication and Artificial Intelligence

TRANSLATION

SEMIOTICS






[RELATED FIELDS]

PHONETIC SCIENCES [See also "Phonetics."]

PSYCHOLOGY

     Cognitive Psychology [See also "Cognition."]
     Developmental Psychology [See also "Development of Language
          in the Individual."]
     Educational Psychology [See also "Language Learning and
          Teaching."]
     Psychology of Perception [See also "Psychoacoustics."]

BIOLOGY [See also "Biolinguistics."]

     Speech Physiology
     Hearing Physiology

MEDICINE AND THERAPY

EDUCATION

SOCIOLOGY

     Socioeconomic Studies

ANTHROPOLOGY [See also "Language and Culture."]

     Cognitive Anthropology
     Social Anthropology

POLITICAL SCIENCE

     Ethnic Minority Problems

GEOGRAPHY

     Demography

MATHEMATICS

COMPUTER SCIENCE

INFORMATION PROCESSING AND DOCUMENTATION

INFORMATION AND COMMUNICATION THEORY

PHILOSOPHY

HUMANITIES

     Literature
     Music



LANGUAGES*

INDO-HITTITE MACRO-PHYLUM

     Anatolian Family
     Indo-European Phylum
          Albanian Isolate
          Armenian Family
          Baltic Family
          Celtic Family
          Germanic Family
          Hellenic Family
          Illyrian Family
          Indic Family
          Iranian Family
          Italo-Romance Family
          Slavic Family
          Tocharian

URALIC-ALTAIC MACRO-PHYLUM

     Uralic Phylum
          Finno-Ugric Family
          Samoyedic Family
     Altaic Phylum
          Korean Isolate
          Mongolian Family
          Tungusic Family
          Turkic Family

AFRO-ASIATIC MACRO-PHYLUM

     Berber Family
     Chadic Family
     Cushitic Family
     Hamitic (Egypto-Coptic) Family
     Semitic Family

AUSTRALIAN MACRO-PHYLUM

SINO-TIBETAN MACRO-PHYLUM

     Kam-Thai Family
     Sinitic Family
     Tibeto-Burman Phylum

AUSTRONESIAN MACRO-PHYLUM

AFRICAN LANGUAGES

     Niger-Congo Phylum
          Adamawa-Eastern
          Central (Bantu)
          Gur
          Kordofanian
          Kwa
          Western Atlantic
     Nilo-Hamitic Family
     Nilo-Saharan Phylum
          Chari-Nile
          Sudanic
     Khoisan (Bushman-Hottentot) Phylum

AMERICAN INDIAN LANGUAGES

     Algonquian Macro-Phylum
     Andean-Equatorian Macro-Phylum
     Azteco-Tanoan Phylum
     Chibchan Macro-Phylum
     Ge-Pano-Carib Macro-Phylum
     Hokan Phylum
     Na-Dene Phylum
     Oto-Manguean Phylum
     Siouan Macro-Phylum
     Ungrouped Amerindian Languages and Groups

CAUCASIAN LANGUAGES

     North Caucasian Phylum
     South Caucasian Family

PAPUAN LANGUAGES

SOUTHEAST ASIAN LANGUAGES

     Andamanese Languages
     Jalotnic Family
     Sakaic Family
     Salweenic Family
     Semangic Family
     Vietnamic Family

BASQUE FAMILY

DRAVIDIAN FAMILY

ESKIMO-CHUKCHEE PHYLUM

MUNDA FAMILY

NIPPONIC (JAPANESE-OKINAWAN) FAMILY

PALEO-SIBERIAN PHYLUM (AINU-GILYAK, KET, YUKAGHIR)

PIDGIN AND CREOLE LANGUAGES

UNGROUPED LANGUAGES

*The outline classification for Languages was prepared by Charles Zisa.






References 



[1] Gifford, Carolyn. A survey of indexing tools for the language
    sciences. CAL-LINCS-70-6, NSF GN-771. Washington, D.C.: Center
    for Applied Linguistics, 1971.

[2] Lewis, Kathleen P., comp. Indexing tools and terminology sources in
    the language sciences: A bibliographical listing. LINCS #2-68,
    NSF GN-653. Washington, D.C.: Center for Applied Linguistics, 1968.
    20 p. [ERIC: ED 021 245]

[3] Rose, Priscilla. Linguistic bibliography count. LINCS #10-70,
    NSF GN-771. Washington, D.C.: Center for Applied Linguistics, 1971.

[4] Trager, G.L. "A bibliographical classification system for linguistics
    and languages." Studies in Linguistics, 1945, 3:54-108.






Chapter 4 



VOCABULARY CONTROL FOR THE LINCS REFERENCE MANAGEMENT SYSTEM (RMS)
By Alfred Pietrzyk

This outline summarizes the initial indexing approaches and authority
file management techniques which, at this time, are considered to be
optimal for use in the proposed Reference Management System (RMS),
the automated central clearinghouse and secondary processing facility
of LINCS. Figure 1 shows the general configuration of the envisaged
RMS. Most of the modules (1-6) will in some way be affected by
coordinated vocabulary control techniques. The emphasis of this outline
is on Module 6 for authority file management. If these plans are
actually implemented, several modifications will no doubt turn out to
be desirable.

1. Indexing

Human indexing at the input processing stage (Module 1) will be dynamic:
standard terms from the RMS authority files (thesaurus descriptors and
language names, broad subject category terms, and auxiliary terms) will
be used in conjunction with identifiers, i.e., current natural language
terms and context-preserving phrases based directly on the source
information and/or its surrogates. Reference units (document surrogates)
will be indexed to an average of 8-10 terms.
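
The combination of controlled terms and free identifiers described above can be pictured as a small record structure. The following Python sketch is only an illustration under stated assumptions: the class name `ReferenceUnit`, its field names, and the sample terms are hypothetical and do not come from the RMS design.

```python
from dataclasses import dataclass, field

# Hypothetical authority-file samples (illustrative terms only).
THESAURUS_DESCRIPTORS = {"PHONOLOGY", "VOWELS", "ORTHOGRAPHY"}
BROAD_TERMS = {"PHONETIC SCIENCES"}
AUXILIARY_TERMS = {"DICTIONARY", "REVISION"}
AUTHORITY = THESAURUS_DESCRIPTORS | BROAD_TERMS | AUXILIARY_TERMS


@dataclass
class ReferenceUnit:
    """A document surrogate indexed with controlled terms plus free identifiers."""
    title: str
    controlled_terms: list = field(default_factory=list)  # must match an authority file
    identifiers: list = field(default_factory=list)       # uncontrolled phrases from the source

    def term_count(self):
        """Total number of index terms assigned to this unit."""
        return len(self.controlled_terms) + len(self.identifiers)

    def unknown_controlled_terms(self):
        """Controlled terms that appear in none of the authority files."""
        return [t for t in self.controlled_terms if t.upper() not in AUTHORITY]


unit = ReferenceUnit(
    title="Whispered vowels in casual speech",
    controlled_terms=["PHONETIC SCIENCES", "PHONOLOGY", "VOWELS"],
    identifiers=["whispered vowels", "casual speech styles"],
)
print(unit.term_count(), unit.unknown_controlled_terms())
```

Keeping the two kinds of terms in separate fields lets the controlled terms be checked against the authority files while the identifiers remain open-ended.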

2. Authority File Management
2.1 Baseline

With a view toward effective vocabulary control for reference materials
in the language sciences, the LINCS program has completed important
preliminaries, with the following overall findings and results:

   - The indexing philosophy of LINCS must be dynamic, i.e.,
     both controlled and uncontrolled open-ended vocabulary
     must be used at the human indexing stage in a carefully
     combined approach in order to ensure

        - high recall in search operations by using controlled
          generic thesaurus terms and controlled broad subject
          category terms;

        - high precision in search operations by using both
          controlled specific thesaurus terms and uncontrolled
          specific terms and phrases extracted from natural
          language text;

        - compatibility with structured indexing tools of
          cooperating information processing and service
          organizations;




[Figure 1: block diagram. Module 1, Input processing; Module 2,
Internal processing; Module 6, Authority file management; Module 7,
Communication/control; OUTPUT PROCESSING: Module 3, Tape dissemination;
Module 4, Search and retrieval; Module 5, Publication.]

Fig. 1. LINCS REFERENCE MANAGEMENT SYSTEM (RMS), GENERAL CONFIGURATION - 1974



        - currency of the indexing vocabulary, comprehensive
          coverage;

        - preservation (in indexing phrases) of syntactic
          contexts with high information content.

The following preliminary drafts of indexing tools and
source materials have been completed or acquired:

   - an experimental sample thesaurus for LINCS,
     prepared by Joy Varley of the LINCS staff;
     the thesaurus contains some 450 unique technical
     terms (descriptors) in the language sciences,
     with considerable specificity in one subfield
     (phonology), structured in accordance with
     COSATI guidelines, including items under USE,
     USED FOR (UF), BROADER TERM (BT), NARROWER
     TERM (NT), RELATED TERM (RT), and SCOPE NOTE
     (SC), with hierarchical display of narrower
     terms to a depth of five levels (see Figure 2);

   - a preliminary classification outline (see
     Chapter 3);

   - a comprehensive coded list of some 5,000 unique
     language and dialect names (17,000 entries
     including synonyms) prepared by the CAL for NSF's
     National Register of Scientific and Technical
     Personnel (the codes cover generic sets);

   - a detailed classification of American Indian
     languages (954 unique items, 3,730 entries
     including synonyms);

   - a listing of some 190 broad subject category
     terms under 46 reference group headings (see
     Table 1);

   - 18 controlled auxiliary terms describing document
     type and status (e.g., "dictionary," "revision");

   - a comprehensive collection of existing thesauri,
     microthesauri, technical dictionaries, indexes,
     and classifications relevant to the language
     sciences, usable as source materials for thesaurus
     construction (not suitable for direct use in the
     proposed RMS).

A limited capability for automated thesaurus display (see
Figure 2) has been assembled on an experimental basis.




WHISPERED VOWELS
   BT  VOWELS
       SEGMENTAL PHONEMES
       PHONEMES
       PHONOLOGY
   RT  WHISPER

WORD CLASSES
   UF  PARTS OF SPEECH
   BT  DESCRIPTIVE (STRUCTURAL) LINGUISTICS
  *NT  ADJECTIVES
       NOUNS
       VERBS

WRITING
   SC  THE MECHANICS OF WRITING
   BT  LITERACY
  *NT  ORTHOGRAPHY
       SPELLING
       SPELLING REFORM
       SOUND-SPELLING CORRESPONDENCES

WRITING SYSTEMS
   USE ORTHOGRAPHY

Fig. 2. LINCS THESAURUS EXCERPT (UNEXPANDED PRELIMINARY DRAFT)
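
The relationships shown in the excerpt (USE, UF, BT, NT, RT, SC) lend themselves to a simple keyed structure. The Python sketch below is a minimal illustration, not the LINCS display program: the dictionary layout and the function names `resolve` and `broader_chain` are hypothetical, and the step-by-step ordering of the broader terms above WHISPERED VOWELS is an assumption read off the order in which the excerpt lists them.

```python
# Hypothetical in-memory form of a few entries from the excerpt.
# Assumption: each broader term listed under WHISPERED VOWELS is the
# parent of the one before it; ORTHOGRAPHY BT WRITING is inferred from
# the reciprocal NT entry under WRITING.
THESAURUS = {
    "WHISPERED VOWELS": {"BT": ["VOWELS"], "RT": ["WHISPER"]},
    "VOWELS": {"BT": ["SEGMENTAL PHONEMES"]},
    "SEGMENTAL PHONEMES": {"BT": ["PHONEMES"]},
    "PHONEMES": {"BT": ["PHONOLOGY"]},
    "PHONOLOGY": {},
    "WRITING": {
        "SC": "THE MECHANICS OF WRITING",
        "BT": ["LITERACY"],
        "NT": ["ORTHOGRAPHY", "SPELLING"],
    },
    "ORTHOGRAPHY": {"UF": ["WRITING SYSTEMS"], "BT": ["WRITING"]},
    "WRITING SYSTEMS": {"USE": "ORTHOGRAPHY"},
}


def resolve(term):
    """Follow USE references until a preferred descriptor is reached."""
    seen = set()
    while "USE" in THESAURUS.get(term, {}):
        if term in seen:
            raise ValueError(f"circular USE reference at {term}")
        seen.add(term)
        term = THESAURUS[term]["USE"]
    return term


def broader_chain(term):
    """List successively broader terms, following the first BT at each level."""
    chain = []
    current = resolve(term)
    while True:
        parents = THESAURUS.get(current, {}).get("BT", [])
        if not parents or parents[0] in chain:
            break
        current = parents[0]
        chain.append(current)
    return chain


print(resolve("WRITING SYSTEMS"))
print(broader_chain("WHISPERED VOWELS"))
```

A search formulated on a generic term can use such a chain in reverse (via NT) to pull in the specific descriptors beneath it.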






A series of LINCS reports deals with preliminaries to thesaurus
construction and maintenance, indexing options, and classification
principles (see References, Part Two, final LINCS project report).*
Significant practical experience was gained in the machine-aided
production of permuted subject indexes for the experimental reference
serial Language and Automation.

For purposes of the proposed RMS, the following requirements remain
unfulfilled:

   - All authority files must be improved, modified, and
     integrated to accommodate precisely all human and
     automated processing requirements in RMS Modules
     1-5, including requirements for compatible interfaces
     with decentralized collaborators.

   - The LINCS thesaurus must be refined and expanded to
     achieve comprehensive coverage of technical terms
     (descriptors) as well as language and dialect names
     needed in RMS processing (the current draft version
     does not include language names).

   - The authority files for broad subject category terms
     and auxiliary terms must be improved and expanded for
     comprehensive coverage.

   - Human and automated procedures for authority file
     construction and maintenance must be fully specified.

   - The existing limited automated aids for thesaurus
     processing must be redesigned for the increased,
     more complex requirements listed above, in order
     to ensure accurate and prompt maintenance of all
     authority files needed in the RMS. The current automated
     capability is uneconomical; the proprietary program
     now in use cannot be modified to include the
     input/edit/update functions and additional
     display formats required minimally for efficient
     authority file management.

   - The initial design of integrated authority file
     management approaches must be open-ended for
     future automation refinements (see Figure 3).

2.2 Objective for 1974

Module 6 will be an operationally ready subsystem for computer-supported
authority file management in the language sciences, with



*Center for Applied Linguistics. An information system program for
the language sciences: Final project report, NSF Grant GN-771.
CAL-LINCS-71-4. Washington, D.C.: Center for Applied Linguistics,
1971.




[Figure 3: flow diagram of Module 6. Recoverable box labels: Term
generation; Keyboard; Correct; Proof; Program #12, Input/edit/update;
Match/validation; Update; Authority files (Thesaurus, Broad, Auxiliary);
Program #13, Sort/list; Computer printouts; Format for photocomp.
(Mod. 5); Photocomp. file; Short-term file (Mod. 2); Hierarchy
generator; Printouts for thesaurus term selection. Some components are
marked "Not included in RMS 1974."]

Fig. 3. LINCS RMS MODULE 6:
        AUTHORITY FILE MANAGEMENT, 1974
        (Prepared in consultation with
        Set Systems, Inc.)



specific application to vocabulary control needs of the RMS and its
input and output processing interfaces (Figure 3). Its main functions
will be:

   - to provide comprehensive desk-top tools (periodically
     updated computer printouts of authority files) for
     vocabulary control - including thesaurus control - in
     centralized and (standardized) decentralized human
     indexing at the input processing stage (Module 1);

   - to provide machine-readable terms needed for automated
     validation of broad subject category terms and auxiliary
     terms in reference file maintenance (Module 2);

   - to provide vocabulary control - including thesaurus
     control - in the formulation of search strategies
     (Module 4);

   - to provide vocabulary control - including thesaurus
     control - in the extracting, indexing, and sorting
     operations of the publication subsystem (Module 5);

   - to utilize human operations for the continuous maintenance
     of all RMS authority files (term generation and
     structuring, keyboarding, editing, updating, proofing,
     correction, and preparation of desk-top tools and tapes
     for use in other RMS modules);

   - to provide a minimal, economical capability for continuous
     computer-supported processing and maintenance
     of the thesaurus, broad subject category terms, and
     auxiliary terms (including input/edit (validate)/update
     functions, master file storage, and sort/list/print in
     thesaurus and other formats required by the RMS).

Additional functions will be added after 1973 (see Figure 3).
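
The automated validation of broad subject category terms and auxiliary terms mentioned for Module 2 amounts to a lookup against machine-readable authority files. The Python sketch below shows one way this could work; the function names, the normalization rule (case and spacing ignored), and the sample terms are assumptions made for illustration, not part of the RMS specification.

```python
def normalize(term):
    """Collapse spacing and case so that trivial variants match."""
    return " ".join(term.split()).upper()


def load_authority_file(lines):
    """Build a lookup set from raw term lines, skipping blank lines."""
    return {normalize(line) for line in lines if line.strip()}


def validate_terms(terms, authority):
    """Split terms into (valid, rejected) against one authority file."""
    valid, rejected = [], []
    for term in terms:
        if normalize(term) in authority:
            valid.append(term)
        else:
            rejected.append(term)
    return valid, rejected


# Hypothetical samples of the broad-term and auxiliary-term files.
broad = load_authority_file(["Phonetic Sciences", "Bilingualism", "Semiotics", ""])
auxiliary = load_authority_file(["dictionary", "revision"])

valid, rejected = validate_terms(["bilingualism", "Semiotcs"], broad)
print(valid, rejected)
```

Rejected terms would be routed to an error listing for human correction rather than silently dropped.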

Following their initial construction in the RMS project, all authority
files will be maintained in regular update cycles. Intellectual efforts
will concentrate on term extraction from current sources and term
structuring in accordance with COSATI guidelines modified for RMS purposes.
The following improved, fully expanded machine-readable authority files
will be available by 1974:

   - a comprehensive, structurally refined language sciences
     thesaurus based on COSATI guidelines (cf. Figure 2)
     containing about 5,000 unique descriptors (technical terms)
     and about 5,000 unique language and dialect names;






   - a comprehensive file of about 400 improved broad subject
     category terms used in packaging of outputs and cooperative
     exchanges of inputs (these broad terms will also be
     included in the thesaurus);

   - about 30 improved auxiliary terms (initial sample only
     by 1974) used as document type and status descriptors.

The results will include full specifications of human and automated
procedures for authority file construction and maintenance. The module
assembly will include an economical computer facility (IBM 360/30).

The authority file capability will consist of the following main
functional flow components (Figure 3):

   - intellectual processing of new or revised authority file
     terms: term collection from current sources, visual matching
     against existing files, structuring and formatting for
     automated input processing;

   - off-line keyboarding of term inputs on magnetic tape
     recording typewriter (including off-line machine-aided
     proofing and correction);

   - computer input and partial machine validation of new
     terms, also error listing and maintenance of processing
     statistics (RMS program #12);

   - updating of machine-readable master files for thesaurus,
     broad terms, and auxiliary terms (program #12);

   - sorting of master file subsets and display (printout)
     including thesaurus format, alphabetical listing, and
     permuted term format (program #13);

   - proofing, correction and re-entry of corrected terms via
     keyboarding and input/edit/update components;

   - preparation of partial or comprehensive machine-readable
     files for use in RMS file maintenance (Module 2) and
     output processing (Modules 3-5).
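
The update and sort/list steps in the flow above can be sketched in a few lines. The Python example below is a hedged illustration only: it is not RMS program #12 or #13, the function names are hypothetical, and "permuted term format" is interpreted here as a rotated listing in which each word of a multiword term leads one entry, which is one common reading of the phrase rather than a documented LINCS format.

```python
def update_master(master, new_terms):
    """Add new terms to the master set; return (sorted master, duplicates)."""
    added, duplicates = [], []
    for term in new_terms:
        key = " ".join(term.split()).upper()
        if key in master or key in added:
            duplicates.append(key)  # would go to the error listing
        else:
            added.append(key)
    return sorted(master | set(added)), duplicates


def permuted_entries(term):
    """Rotate a multiword term so that each word leads one index entry."""
    words = term.split()
    entries = []
    for i in range(len(words)):
        lead = " ".join(words[i:])
        tail = " ".join(words[:i])
        entries.append(f"{lead}, {tail}" if tail else lead)
    return entries


def permuted_listing(terms):
    """Alphabetical listing of every rotation, paired with its source term."""
    rows = [(entry, term) for term in terms for entry in permuted_entries(term)]
    return sorted(rows)


master, dups = update_master({"SPELLING REFORM"}, ["Word Classes", "spelling  reform"])
for entry, source in permuted_listing(master):
    print(f"{entry:<20} see {source}")
```

Sorting the rotations rather than the terms themselves is what lets a user find SPELLING REFORM under either word.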






Table 1. REFERENCE GROUPS IN THE LANGUAGE SCIENCES*

The 46 user-oriented reference groups listed below have been established
on the basis of operational criteria including the productivity of
published research in given areas. Pragmatic criteria prevail over
intellectual and taxonomic principles. Together, the reference groups cover
the entire spectrum of the language sciences. The subject categories
given for certain reference groups are illustrative rather than exhaustive.
The subject categories are listed approximately in accordance with their
relative importance in a reference group. Certain categories of primary
importance in one reference group re-occur as secondary categories in
other reference groups. The general linguistics group cuts across the
entire set of reference groups. However, services in this category
involve, in part, a non-overlapping subset of the total audience. The
number of potential LINCS users in 1976 estimated for each reference
group includes only those users with a primary interest in the group
involved, i.e., all figures listed are non-overlapping. Specific
services focused on various reference groups will, of course, be offered
to wider audiences. Likewise, the number of message units (articles,
books, etc.) expected in 1976 has been estimated in each case only for
material of focal interest. Given services will, however, include
selections from other reference groups. The reference group concept
is dynamic; it will be continuously refined and modified in the light of
changing user requirements, advice from the community, and newly evolving
research and publication patterns.



                                                         Total no. of
                                                         message units
                                             Total no.   (articles, books,
                                             of users,   etc.),
    Reference Group                          1976        1976

 1  GENERAL LINGUISTICS                        18,150     1,870

      History of linguistics
      Theoretical linguistics
      Descriptive linguistics
      Historical linguistics
      Other language sciences
      All language groupings



* Prepared in collaboration with Joy Varley and other members of the
LINCS staff, as well as consultants specializing in various subfields
of the language sciences.






 2  PHONETIC SCIENCES                           9,320     1,680

      Acoustic phonetics
      Physiological phonetics
      Perceptual phonetics (speech perception)
      Descriptive phonetics
      Historical phonetics
      Statistical phonetics
      Phonology/phonemics
      Automatic speech analysis and
        synthesis
      Phonetics and communication sciences
      Psychoacoustics
      Phoniatrics
      Logopedics

 3  THEORETICAL AND DESCRIPTIVE LINGUISTICS    12,870     2,100

      Foundations of linguistics
      "Schools" of linguistics
      Theory of phonology
      Theory of writing
      Theory of grammar
      Semantic theory
      Language universals
      Formal and mathematical linguistics
      Linguistic methodology
      Descriptive linguistics (principles)
      Historical linguistics (principles)
      Linguistic phylogeny
      Linguistic ontogeny
      Typology of languages
      Linguistics and logic
      Linguistics and philosophy
      Other language sciences
      History of linguistics






 4  LEXICOLOGY AND LEXICOGRAPHY                 2,365       390

      Lexical theory and applications
      Monolingual dictionaries
      Bilingual dictionaries
      Bidialectal dictionaries
      Multilingual dictionaries
      Etymological dictionaries
      Bilingualism
      Specialized terminologies
      General thesauri
      Information retrieval thesauri
      Lexical planning
      Etymology
      Automatic dictionary lookup
      Automatic dictionary publishing
      Theoretical and descriptive linguistics

 5  HISTORICAL LINGUISTICS AND CLASSICAL
      LANGUAGES                                 4,830       640

      Diachronic linguistics (theoretical
        and descriptive)
      Comparative method
      Glottochronology
      Lexicostatistics
      Proto-language reconstruction
      Classical languages

 6  LINGUISTIC GEOGRAPHY                        3,220       960

      Dialectology
      Linguistic atlases
      Dialect descriptions
      Censuses
      Onomastics
      Bilingualism






 7  LANGUAGE AND CULTURE                       11,590     2,390

      Linguistics and anthropology
      Ethnolinguistics
      Cognitive anthropology
      Ethnographic semantics
      Ethnography of communication
      Sociolinguistics
      Speech communities
      Area studies
      Culture history
      Language and mission work
      Literacy

 8  SOCIAL DIALECTS AND EDUCATION              11,020     1,410

      Microsociolinguistics
      Social dialect description
      Bidialectalism
      Psycholinguistics
      Small group communication
      Ethnic minority dialects
      Standard dialects for speakers
        of other dialects
      Technical and other functional styles
      Social anthropology
      Social psychology
      Socioeconomic studies
      Sociology

 9  LANGUAGE PROBLEMS AND LANGUAGE
      PLANNING                                  4,050       790

      Macrosociolinguistics
      National language situations
      Language planning
      Language codification (standardization)
      Linguistic innovation and borrowing
      Orthography
      Orthoepy
      Language policies






      Literacy
      Language maintenance and shift
      Ethnic minority problems
      Bilingualism
      Multilingualism
      Specialized terminologies
      Languages of wider communication
      Second language learning
      Artificial languages
      Pidgins and creoles

10  BILINGUALISM                               11,690     1,240

      Bilingualism theory
      Bilingualism description
      Languages in contact
      Contrastive linguistics
      Diglossia
      Multilingualism
      Bidialectalism
      Linguistic borrowing
      Language and culture
      Psycholinguistics
      Language problems and language
        planning

11  CONTRASTIVE LINGUISTICS                     8,000     1,040

      Theory of contrastive linguistics
      Contrastive analyses
      Error analyses
      Bilingualism

12  FOREIGN AND SECOND LANGUAGE
      EDUCATION                                29,540     3,530

      Language teaching methodology
      Physiology and psychology of language
        learning
      Technology of language education






      Language ability testing
      Teacher education
      Teaching materials
      Curriculum studies
      Program evaluation
      Language aptitude testing
      Analysis and teaching of the
        cross-cultural language context
      Psycholinguistics

13  TECHNOLOGY OF LANGUAGE EDUCATION           10,215     1,410

      Audiovisual techniques
      Programmed learning
      Self-instructional techniques
        and materials
      Teaching machines
      Language laboratories
      Tape collections
      Evaluation of language-learning
        technologies
      Psycholinguistics
      Language and culture
      Language and automation

14  LANGUAGE AND BEHAVIOR                      13,885     2,780

      Psycholinguistics
      Verbal behavior
      Linguistics and cognitive psychology
      Neurolinguistics
      Psychoacoustics
      Language and the child
      Biolinguistics
      Pathology of language
      Psychology of perception
      Psychology of learning
      Developmental psychology
      Psychometrics
      Educational psychology
      Special education






15  LINGUISTICS AND MEDICINE                   33,355     4,470

      Speech physiology
      Speech pathology
      Hearing physiology
      Hearing pathology
      Aphasia
      Dyslexia
      Neurolinguistics
      Language and mental health
      Biolinguistics
      Psychiatry
      Psychopathology
      Phoniatrics and logopedics
      Otolaryngology
      Audiology and audiometrics
      Human communication disorders
      Communication of the blind
      Medical terminology
      Language education of the handicapped

16  LINGUISTICS AND THE HUMANITIES              4,730       700

      Language and literature
      Linguistics and philology
      Linguistics and poetry
      Stylistics
      Rhetoric
      Stylostatistics
      Content analysis
      Classical and mediaeval studies
      Linguistics and music
      Linguistics and other humanities
      Language and culture
      Mass communication

17  LANGUAGE AND AUTOMATION                    12,960     1,660

      Computational linguistics (automatic
        language processing)
      Quantitative linguistics






      Mechanical translation
      Machine-aided language learning
      Linguistics and computer science
      Theoretical and descriptive linguistics
      Automation in the humanities and
        social sciences
      Artificial intelligence
      Man-machine communication

18  SEMIOTICS                                   5,210     1,300

      Theory of signs
      Paralinguistics
      Proxemics
      Kinesics
      Human communication
      Animal communication (zoosemiotics)
      Ethology
      Anthropology

19  TRANSLATION                                15,490     1,600

      Human translation theory
      Human translation applications
      Theory of machine translation
      Machine-aided translation
      Lexicology and lexicography
      Dictionaries
      Specialized terminologies
      Sociolinguistics

20  ONOMASTICS                                  1,160       250

      Anthroponymy
      Toponymy
      Lexicology and lexicography






21  FRENCH                                     21,050     2,770
22  IBERIAN LANGUAGES                          21,400     2,740
23  ITALIAN                                     3,515       550
24  ENGLISH LINGUISTICS                        25,720     3,120
25  ENGLISH AS A NATIVE LANGUAGE               28,100     3,940
26  ENGLISH FOR SPEAKERS OF OTHER
      LANGUAGES                                12,640     1,750
27  GERMAN                                      9,470     1,410
28  SCANDINAVIAN                                1,475       560
29  SLAVIC AND BALTIC                           2,005       510
30  LANGUAGES OF THE SOVIET UNION               2,520       490
31  RUSSIAN                                     6,660     1,0[illegible]
32  URALIC                                        960       150
33  ALTAIC                                      1,115       150
34  SOUTH ASIAN                                 1,875       240
35  SOUTHEAST ASIAN                             3,290       500
36  CHINESE                                     6,560     1,240
37  JAPANESE                                    5,075       950
38  AFRO-ASIATIC                                4,730       810
39  LANGUAGES OF SUB-SAHARAN AFRICA             6,700       950
40  MALAYO-POLYNESIAN                           1,250       250






41  PACIFIC LANGUAGES                           1,245       230
42  AUSTRALIAN LANGUAGES                        1,010       150
43  NORTH AMERICAN INDIAN, ESKIMO
      AND ALEUT                                 1,915       300
44  SOUTH AMERICAN INDIAN                       2,780       650
45  PIDGINS AND CREOLES                         1,245       130
46  ARTIFICIAL AND AUXILIARY LANGUAGES          8,505     1,040

                                              406,460    59,870
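
Because the user estimates are stated to be non-overlapping, the user total should equal the sum of the 46 group figures. The short Python check below illustrates that arithmetic; the only assumption is the transcription of the figures from Table 1, and the names `USERS_1976` and `check_total` are introduced here for illustration. The message-unit column is not checked because one of its entries is illegible in the source.

```python
# Estimated users per reference group, 1976, transcribed from Table 1.
USERS_1976 = [
    18150, 9320, 12870, 2365, 4830, 3220, 11590, 11020, 4050, 11690,   # groups 1-10
    8000, 29540, 10215, 13885, 33355, 4730, 12960, 5210, 15490, 1160,  # groups 11-20
    21050, 21400, 3515, 25720, 28100, 12640, 9470, 1475, 2005, 2520,   # groups 21-30
    6660, 960, 1115, 1875, 3290, 6560, 5075, 4730, 6700, 1250,         # groups 31-40
    1245, 1010, 1915, 2780, 1245, 8505,                                # groups 41-46
]
REPORTED_TOTAL = 406460


def check_total(figures, reported):
    """Return the computed sum and whether it matches the reported total."""
    computed = sum(figures)
    return computed, computed == reported


computed, matches = check_total(USERS_1976, REPORTED_TOTAL)
print(len(USERS_1976), computed, matches)  # 46 406460 True
```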






